SPECIFICATION



Method, Apparatus, Computer program, Computer system and 
Computer-readable storage for Representing and 
Searching for an Object in an Image 

Technical Field
The present invention relates to the representation of
an object appearing in a still or video image, such as an
image stored in a multimedia database, especially for 
searching purposes, and to a method and apparatus for 
searching for an object using such a representation. 

Background Art

In applications such as image or video libraries, it is 
desirable to have an efficient representation and storage of 
the outline or shape of objects or parts of objects 
appearing in still or video images. A known technique for 
shape-based indexing and retrieval uses Curvature Scale 
Space (CSS) representation. Details of the CSS 
representation can be found in the papers "Robust and
Efficient Shape Indexing through Curvature Scale Space" Proc.
British Machine Vision Conference, pp 53-62, Edinburgh, UK,
1996 and "Indexing an Image Database by Shape Content using
Curvature Scale Space" Proc. IEE Colloquium on Intelligent
Databases, London 1996, both by F. Mokhtarian, S. Abbasi and
J. Kittler, the contents of which are incorporated herein by
reference.

The CSS representation uses a curvature function for 
the outline of the object, starting from an arbitrary point 
on the outline. The curvature function is studied as the 
outline shape is evolved by a series of deformations which 
smooth the shape. More specifically, the zero crossings of 
the derivative of the curvature function convolved with a 
family of Gaussian filters are computed. The zero crossings 
are plotted on a graph, known as the Curvature Scale Space, 
where the x-axis is the normalised arc-length of the curve 
and the y-axis is the evolution parameter, specifically, the 
parameter of the filter applied. The plots on the graph 
form loops characteristic of the outline. Each convex or 
concave part of the object outline corresponds to a loop in 
the CSS image. The co-ordinates of the peaks of the most 
prominent loops in the CSS image are used as a 
representation of the outline. 

To search for objects in images stored in a database 
matching the shape of an input object, the CSS 
representation of an input shape is calculated. The 
similarity between an input shape and stored shapes is 
determined by comparing the position and height of the peaks 
in the respective CSS images using a matching algorithm.

A problem with the known CSS representation is that the
peaks for a given outline are based on the curvature 
function which is computed starting from an arbitrary point 
on the outline. If the starting point is changed, then 
there is a cyclic shift along the x-axis of the peaks in the 
CSS image. Thus, when a similarity measure is computed, all 
possible shifts need to be investigated, or at least the 
most likely shift. This results in increased complexity in 
the searching and matching procedure. 
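
By way of illustration only, the cyclic shift described above can be sketched as follows. The function name `shift_start_point` and the sample peak values are hypothetical; the sketch assumes the x co-ordinate is a normalised arc length in [0, 1), so that moving the starting point by an arc length s changes every peak position by the same amount modulo 1 while leaving the peak heights unchanged.

```python
def shift_start_point(peaks, s):
    """Return the CSS peaks as seen when the outline start point moves by arc length s.

    peaks: list of (x, y) pairs, x being the normalised arc-length position in [0, 1).
    Only x changes (cyclically); the peak heights y are unaffected.
    """
    return [((x - s) % 1.0, y) for x, y in peaks]


# Hypothetical peaks for the same outline traced from two different starting points.
peaks = [(0.124, 123), (0.68, 548), (0.22, 2120)]
shifted = shift_start_point(peaks, 0.25)
```

A matcher that compares `peaks` and `shifted` position by position would report a mismatch even though both describe the same outline, which is why the possible shifts have to be investigated.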

Accordingly the present invention provides a method of 
representing an object appearing in a still or video image, 
by processing signals corresponding to the image, the method 
comprising deriving a plurality of numerical values 
associated with features appearing on the outline of an 
object starting from an arbitrary point on the outline and 
applying a predetermined ordering to said values to arrive 
at a representation of the outline. Preferably, said values 
are derived from a CSS representation of said outline, and 
preferably they correspond to the CSS peak values. 

It has been found that by applying a transformation, 
especially to CSS values, as in the invention, object 
retrieval performance is improved.

Disclosure of Invention
A method of representing an object appearing in a still
or video image, by processing signals corresponding to the
image set forth in claim 1, the method comprises deriving a
plurality of numerical values representing features
appearing on the outline of an object and applying a scaling
or non-linear transformation to said values to arrive at a
representation of the outline.

In a method set forth in claim 2, the numerical values 
reflect points of inflection on the outline. 

A method set forth in claim 3 comprises deriving a 
curvature scale space representation of the outline by 
smoothing the outline in a plurality of stages using a 
smoothing parameter resulting in a plurality of outline 
curves, using values for feature points on each outline 
curve to derive curves characteristic of the original 
outline, and selecting the co-ordinates of peaks of said 
characteristic curves, wherein said transformation is 
applied to peak co-ordinate values. 

In a method set forth in claim 4, the feature points 
relate to the curvature of each outline curve. 

In a method set forth in claim 5, the feature points
relate to the maxima and minima of the curvature of the
outline curves.

A method of representing an object appearing in a still 
or video image, by processing signals corresponding to the 
image set forth in claim 6, the method comprises deriving a 
curvature scale space representation of the object outline, 
selecting co-ordinates for peaks in the curvature scale 
space, and applying a non-trivial transformation to peak co- 
ordinate values to arrive at a representation of the object 
outline. 

In a method set forth in claim 7, the transformation is 
applied to the co-ordinate values corresponding to a 
smoothing parameter in the CSS representation. 

In a method set forth in claim 8, the transformation is 
applied to the co-ordinate values corresponding to an arc- 
length parameter along the outline. 

In a method set forth in claim 9, the transformation is 
a scaling transformation. 

In a method set forth in claim 10, the transformation 
is a non-linear transformation. 

In a method set forth in claim 11, the transformation
is in the form of z' = a pow(z, b) + c, where a, b and c are
constants and pow(z, b) denotes z to the power b.

In a method set forth in claim 12, b is greater than
zero and less than 1.

In a method set forth in claim 13, b is in the range of
0.25 <= b <= 0.75.

In a method set forth in claim 14, b = 0.5. 

A method for searching for an object in a still or 
video image by processing signals corresponding to images 
set forth in claim 15, the method comprises inputting a 
query in the form of a two-dimensional outline, deriving a 
descriptor of said outline using a method as claimed in any 
one of claims 1 to 10, obtaining a descriptor of objects in 
stored images derived using a method as claimed in any one 
of claims 1 to 10 and comparing said query descriptor with 
each descriptor for a stored object, and selecting and 
displaying at least one result corresponding to an image 
containing an object for which the comparison indicates a 
degree of similarity between the query and said object. 

An apparatus set forth in claim 16 is adapted to 
implement a method as claimed in any one of claims 1 to 15. 

A computer program set forth in claim 17 implements a 
method as claimed in any one of claims 1 to 15.

A computer system set forth in claim 18 is programmed
to operate according to a method as claimed in any one of 
claims 1 to 15. 

A computer-readable storage medium set forth in claim 
19 stores computer-executable process steps for implementing 
a method as claimed in any one of claims 1 to 15. 

A method of representing objects in still or video
images set forth in claim 20 is described with reference to
the accompanying drawings.

A method of searching for objects in still or video
images set forth in claim 21 is described with reference to the
accompanying drawings.

A computer system set forth in claim 22 is described
with reference to the accompanying drawings.



Brief Description of the Drawings

Fig. 1 is a block diagram of a video database system;
Fig. 2 is a drawing of an outline of an object;
Fig. 3 is a CSS representation of the outline of Fig. 2;
Fig. 4 is a diagram illustrating the representation of
a shape;
Fig. 5 is a drawing of the shape of an object;
Fig. 6 is a CSS representation of the shape of Fig. 5;
Fig. 7 is a transformed representation of the shape of
Fig. 5; and
Fig. 8 is a block diagram illustrating a searching
method.



Best Mode for Carrying Out the Invention

First embodiment 

Fig. 1 shows a computerised video database system
according to an embodiment of the invention. The system
includes a control unit 2 in the form of a computer, a
display unit 4 in the form of a monitor, a pointing device 6
in the form of a mouse, an image database 8 including stored
still and video images and a descriptor database 10 storing
descriptors of objects or parts of objects appearing in
images stored in the image database 8.

A descriptor for the shape of each object of interest 
appearing in an image in the image database is derived by 
the control unit 2 and stored in the descriptor database 10. 
The control unit 2 derives the descriptors operating under 
the control of a suitable program implementing a method as 
described below. 

Firstly, for a given object outline, a CSS 
representation of the outline is derived. This is done 
using the known method as described in one of the papers 
mentioned above. 

More specifically, the outline is expressed by a
representation Ψ = {(x(u), y(u)), u ∈ [0, 1]} where u is a
normalised arc length parameter.

The outline is smoothed by convolving Ψ with a 1D
Gaussian kernel g(u, σ), and the curvature zero crossings of
the evolving curve are examined as σ changes. The zero
crossings are identified using the following expression for
the curvature:

\[
k(u,\sigma) = \frac{X_u(u,\sigma)\,Y_{uu}(u,\sigma) - X_{uu}(u,\sigma)\,Y_u(u,\sigma)}{\left(X_u(u,\sigma)^2 + Y_u(u,\sigma)^2\right)^{3/2}}
\]

where

\[
X(u,\sigma) = x(u) * g(u,\sigma), \qquad Y(u,\sigma) = y(u) * g(u,\sigma)
\]

and

\[
X_u(u,\sigma) = x(u) * g_u(u,\sigma), \qquad X_{uu}(u,\sigma) = x(u) * g_{uu}(u,\sigma)
\]

Here * represents a convolution and the subscripts
represent derivatives. Y_u and Y_uu are defined in the same
way from y(u).

The number of curvature zero crossings changes as σ
changes, and when σ is sufficiently high Ψ is a convex curve
with no zero crossings.
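
The smoothing and zero-crossing step can be sketched numerically as follows. This is a minimal sketch under stated assumptions, not the implementation of the embodiment: it assumes the standard planar curvature formula k = (Xu Yuu - Xuu Yu) / (Xu^2 + Yu^2)^(3/2), uses NumPy, expresses the smoothing parameter in samples rather than in normalised arc length, and differentiates the smoothed signal by central differences instead of convolving with derivative kernels (the two are equivalent in the continuous case). The helper names and the three-lobed test outline are hypothetical.

```python
import numpy as np

def smooth_closed(signal, sigma):
    """Circularly convolve a periodic 1D signal with a Gaussian (sigma in samples)."""
    n = len(signal)
    k = np.arange(n)
    k = np.minimum(k, n - k)              # wrap-around distance from sample 0
    g = np.exp(-0.5 * (k / sigma) ** 2)
    g /= g.sum()                          # unit-gain kernel
    return np.real(np.fft.ifft(np.fft.fft(signal) * np.fft.fft(g)))

def d(f):
    """Central difference of a periodic signal."""
    return (np.roll(f, -1) - np.roll(f, 1)) / 2.0

def curvature(x, y, sigma):
    """Curvature of the closed outline (x, y) after Gaussian smoothing."""
    X, Y = smooth_closed(x, sigma), smooth_closed(y, sigma)
    Xu, Yu = d(X), d(Y)
    Xuu, Yuu = d(Xu), d(Yu)
    return (Xu * Yuu - Xuu * Yu) / (Xu ** 2 + Yu ** 2) ** 1.5

def zero_crossings(k):
    """Indices i where k changes sign between sample i and i + 1 (cyclically)."""
    s = np.sign(k)
    return np.nonzero(s * np.roll(s, -1) < 0)[0]

# Hypothetical test outline: a three-lobed closed curve with three concavities.
t = np.linspace(0.0, 2.0 * np.pi, 512, endpoint=False)
r = 1.0 + 0.3 * np.cos(3.0 * t)
x, y = r * np.cos(t), r * np.sin(t)

few = zero_crossings(curvature(x, y, sigma=2.0))     # light smoothing
none = zero_crossings(curvature(x, y, sigma=100.0))  # heavy smoothing
```

With light smoothing each concavity contributes a pair of zero crossings; with heavy smoothing the outline becomes convex and the crossings disappear. Recording the crossing positions over a range of smoothing values gives the points plotted in the CSS image.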

The zero crossing points are plotted on a graph, known
as the CSS image space. This results in a plurality of
characteristic curves. The peaks of the characteristic
curves are identified and the corresponding co-ordinates are
extracted and stored. In general terms, this gives a set of
n co-ordinate pairs [(x1, y1), (x2, y2), ..., (xn, yn)], where n
is the number of peaks, xi is the arc-length position of
the ith peak and yi is the peak height.

In this embodiment, a binomial filter with coefficients 
(1/4, 1/2, 1/4) is used as an approximation of a Gaussian 
filter with some reduction of computational complexity. The 
reduction in computational complexity results from 
convenient filter coefficients which can be efficiently 
implemented on a DSP or a general-purpose processor.
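
A minimal sketch of such a filter, applied cyclically to one co-ordinate sequence of a closed outline, is given below. The function name `binomial_smooth` and the `passes` parameter are illustrative assumptions. Each pass of the (1/4, 1/2, 1/4) kernel has variance 1/2, so repeated passes approximate a Gaussian whose variance grows with the number of passes.

```python
def binomial_smooth(values, passes=1):
    """Apply the (1/4, 1/2, 1/4) filter cyclically to a closed-contour sequence."""
    out = list(values)
    n = len(out)
    for _ in range(passes):
        # out[i - 1] wraps to the last sample when i == 0; (i + 1) % n wraps forward.
        out = [0.25 * out[i - 1] + 0.5 * out[i] + 0.25 * out[(i + 1) % n]
               for i in range(n)]
    return out


# A single spike spreads into the kernel shape after one pass.
spread = binomial_smooth([0, 0, 4, 0, 0])
```

The coefficients are powers of two, so on integer data the same filter can be realised with shifts and additions only.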

The peak values, or in other words, the y-component
values for the peaks, are then processed further. More
specifically, the y values are transformed using the
transformation:

y' = a pow(y, b) + c        (1)

where pow(y, b) denotes y to the power b.

This results in a new set of peak values [(x1, y'1),
(x2, y'2), ..., (xn, y'n)], which values are stored in the
descriptor database as a descriptor of the outline.

As a specific example, the outline shown in Fig. 2 
results in a CSS image as shown in Fig. 3. Details of the 
co-ordinates of the peaks of the curves in the CSS image are 
given in Table 1 below.

Peak Index    X        Y
1             0.124    123
2             0.68     548
3             0.22     2120
4             0.773    1001
5             0.901    678

Table 1.



The transformation given above is then applied, with a
= 6, b = 0.5 and c = 0. In other words, the square root of
the original y value is taken and multiplied by a constant.
This results in the following values:



Peak Index    X        Y
1             0.124    67
2             0.68     140
3             0.22     276
4             0.773    190
5             0.901    156

Table 2.



Here, the values are rounded to the nearest integer, 
but other roundings can be used. 
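
The worked example can be checked with a short sketch. The function name `transform_peaks` is hypothetical; the peak values are those of Table 1 and the constants are those stated above (a = 6, b = 0.5, c = 0).

```python
def transform_peaks(peaks, a, b, c):
    """Apply y' = a * pow(y, b) + c to the height of every (x, y) peak."""
    return [(x, a * pow(y, b) + c) for x, y in peaks]


# Peak co-ordinates from Table 1.
table1 = [(0.124, 123), (0.68, 548), (0.22, 2120), (0.773, 1001), (0.901, 678)]

# Transform, then round each height to the nearest integer as in Table 2.
table2 = [(x, round(y)) for x, y in transform_peaks(table1, a=6, b=0.5, c=0)]
```

The x values pass through unchanged; only the peak heights are compressed, so that the largest peak (2120) is about 4 times the smallest (123) after the transformation instead of about 17 times.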



Second embodiment 

Another example is shown in Fig. 4. 

Fig. 5 shows another example of an object shape, in 
this case a turtle. Fig. 6 shows the CSS peaks for the 
shape of Fig. 5. Fig. 7 shows the transformed peaks of Fig.
6, using the transformation given in equation (1) above,
with a = 6.5, b = 0.5 and c = 0.

The stored descriptors are used for searching purposes. 
The user initiates a search by drawing an object outline on 
the display using the pointing device (step 510). The 
control unit 2 then derives a CSS representation of the 
input outline (step 520) and then applies the transformation 
to the y values as described above (step 530). The 
resulting descriptor of the input outline is then compared 
with each stored descriptor in the descriptor database, 
known in the following as the model descriptors, using a 
known matching procedure (step 540). 

The matching comparison is carried out using a 
suitable algorithm resulting in a similarity measure for 
each descriptor in the database. A known matching 
algorithm such as described in the above-mentioned papers 
can be used. That matching procedure is briefly described 
below. 

Given two closed contour shapes, the image curve Ψi and
the model curve Ψm, and their respective sets of peaks
{(xi1, yi1), (xi2, yi2), ..., (xin, yin)} and {(xm1, ym1), (xm2, ym2),
..., (xmn, ymn)}, the similarity measure is calculated. The
similarity measure is defined as the total cost of matching
peaks in the model to peaks in the image. The matching
which minimises the total cost is determined using dynamic
programming. The algorithm recursively matches the peaks
from the model to the peaks from the image and calculates
the cost of each such match. Each model peak can be matched
with only one image peak and each image peak can be matched
with only one model peak. Some of the model and/or image
peaks may remain unmatched, and there is an additional
penalty cost for each unmatched peak. Two peaks can be
matched if their horizontal distance is less than 0.2. The
cost of a match is the length of the straight line between
the two matched peaks. The cost of an unmatched peak is its
height.

In more detail the algorithm works by creating and
expanding a tree-like structure, where nodes correspond to
matched peaks:

1. Create a starting node consisting of the largest
maximum of the image (xik, yik) and the largest maximum of
the model (xmr, ymr).

2. For each remaining model peak which is within 80
percent of the largest maximum of the image peaks create an
additional starting node.

3. Initialise the cost of each starting node created
in 1 and 2 to the absolute difference of the y-coordinate of
the image and model peaks linked by this node.

4. For each starting node in 3, compute the CSS shift 
parameter alpha, defined as the difference in the x 
(horizontal) coordinates of the model and image peaks 
matched in this starting node. The shift parameter will be 
different for each node. 

5. For each starting node, create a list of model
peaks and a list of image peaks. The lists hold information
on which peaks are yet to be matched. For each starting node
mark peaks matched in this node as "matched", and all other
peaks as "unmatched".

6. Recursively expand a lowest cost node (starting 
from each node created in steps 1-6 and following with its 
children nodes) until the condition in point 8 is fulfilled. 
To expand a node use the following procedure: 

7. Expanding a node: 

If there is at least one image and one model peak
left unmatched:

select the largest scale image curve CSS maximum
which is not matched (xip, yip). Apply the starting node
shift parameter (computed in step 4) to map the selected
maximum to the model CSS image - now the selected peak has
coordinates (xip - alpha, yip). Locate the nearest model curve
peak which is unmatched (xms, yms). If the horizontal
distance between the two peaks is less than 0.2 (i.e.
|xip - alpha - xms| < 0.2), match the two peaks and define the cost
of the match as the length of the straight line between the
two peaks. Add the cost of the match to the total cost of
that node. Remove the matched peaks from the respective
lists by marking them as "matched". If the horizontal
distance between the two peaks is greater than 0.2, the
image peak (xip, yip) cannot be matched. In that case add its
height yip to the total cost and remove only the peak
(xip, yip) from the image peak list by marking it as
"matched".

Otherwise (There are only image peaks or there are 
only model peaks left unmatched): 

Define the cost of the match as the height of the 
highest unmatched image or model peak and remove that peak 
from the list. 

8. If after expanding a node in 7 there are no
unmatched peaks in both the image and model lists, the
matching procedure is terminated. The cost of this node is
the similarity measure between the image and model curve.
Otherwise, go to point 7 and expand the lowest cost node.

The above procedure is repeated with the image curve 
peaks and the model curve peaks swapped. The final matching 
value is the lower of the two. 
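
A simplified sketch of this matching procedure is given below. It is not the procedure of the cited papers in every detail, and it makes the following assumptions, all of them illustrative: each starting node is expanded to completion and the lowest total cost is taken, instead of always expanding the current lowest-cost node; "within 80 percent" is read as a model peak height of at least 0.8 times the largest image peak height; horizontal distances are measured cyclically on [0, 1); and both peak lists are non-empty. The function names are hypothetical.

```python
import math

def cyclic_dist(a, b):
    """Horizontal distance between two arc-length positions, wrapping at 1."""
    d = abs(a - b) % 1.0
    return min(d, 1.0 - d)

def chain_cost(image, model, i0, m0):
    """Total cost of the chain grown from the starting pair (image[i0], model[m0])."""
    alpha = image[i0][0] - model[m0][0]        # CSS shift parameter (step 4)
    cost = abs(image[i0][1] - model[m0][1])    # starting-node cost (step 3)
    img = [p for k, p in enumerate(image) if k != i0]
    mod = [p for k, p in enumerate(model) if k != m0]
    img.sort(key=lambda p: -p[1])              # highest unmatched image peak first
    for xi, yi in img:
        if mod:
            # Nearest unmatched model peak after applying the shift.
            j = min(range(len(mod)),
                    key=lambda k: cyclic_dist(xi - alpha, mod[k][0]))
            dx = cyclic_dist(xi - alpha, mod[j][0])
            if dx < 0.2:
                cost += math.hypot(dx, yi - mod[j][1])  # straight-line length
                mod.pop(j)
                continue
        cost += yi                             # unmatched image peak: add its height
    return cost + sum(y for _, y in mod)       # leftover model peaks: add their heights

def one_way_cost(image, model):
    """Lowest chain cost over all starting nodes (steps 1 and 2)."""
    i0 = max(range(len(image)), key=lambda k: image[k][1])
    m_top = max(range(len(model)), key=lambda k: model[k][1])
    starts = {m_top} | {k for k, (_, y) in enumerate(model)
                        if y >= 0.8 * image[i0][1]}
    return min(chain_cost(image, model, i0, m) for m in starts)

def similarity(image, model):
    """Run the procedure both ways and keep the lower value."""
    return min(one_way_cost(image, model), one_way_cost(model, image))
```

Because the shift parameter is taken from the starting pair, a model whose peaks are a cyclically shifted copy of the image peaks yields a cost of essentially zero, while each peak without a counterpart adds its height to the cost.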

As another example, for each position in the ordering, 
the distance between the input x value and the corresponding 
model x value and the distance between the input y value and
the corresponding model y value are calculated. The total 
distance over all the positions is calculated and the 
smaller the total distance, the closer the match. If the 
number of peaks for the input and the model are different, 
the peak height for the leftovers is included in the total 
distance. 
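
This position-by-position comparison can be sketched as follows. The text does not state how the x and y distances are combined, so the sketch assumes they are absolute differences that are simply summed; the function name `ordered_distance` is hypothetical.

```python
def ordered_distance(query, model):
    """Sum of per-position x and y distances between two ordered peak lists.

    Peaks beyond the length of the shorter list contribute their height.
    """
    total = 0.0
    for (xq, yq), (xm, ym) in zip(query, model):
        total += abs(xq - xm) + abs(yq - ym)
    shared = min(len(query), len(model))
    longer = query if len(query) > len(model) else model
    total += sum(y for _, y in longer[shared:])   # leftover peaks
    return total


# One shared position plus one leftover query peak of height 4.
example = ordered_distance([(0.1, 10), (0.5, 4)], [(0.2, 7)])
```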

The above steps are repeated for each model in the 
database (step 480). 

The similarity measures resulting from the matching 
comparisons are ordered (step 490) and the objects 
corresponding to the descriptors having similarity measures 
indicating the closest match (i.e. here the lowest
similarity measures) are then displayed on the display unit
4 for the user (step 500). The number of objects to be 
displayed can be pre-set or selected by the user. 

Third embodiment 

An alternative embodiment will now be described. This
embodiment is the same as the previous embodiment, except
that a different transformation is used. More specifically,
the y values are transformed using the transformation:

y' = a0 + a1 y.

In other words, a linear, scaling, transformation is
applied.

Here, a0 = 41, a1 = 0.19.

In a variation, a0 = 0 and a1 = 0.27.

Different values of a0 and a1 can be used as appropriate.
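
As an illustration only, the linear transformation y' = a0 + a1 y can be applied to a list of (x, y) peaks as sketched below. The function name `linear_transform` is hypothetical, and the two sample peaks are taken from Table 1 of the first embodiment purely to have numbers to work with.

```python
def linear_transform(peaks, a0, a1):
    """Apply y' = a0 + a1 * y to the height of every (x, y) peak."""
    return [(x, a0 + a1 * y) for x, y in peaks]


sample = [(0.124, 123), (0.22, 2120)]
scaled = linear_transform(sample, a0=41, a1=0.19)    # stated values
variant = linear_transform(sample, a0=0, a1=0.27)    # stated variation
```

Unlike the power-law transformation, this one preserves the relative spacing of the peak heights and only rescales and offsets them.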

The searching and matching procedure is essentially as 
described in the previous embodiment. 

It has been found that applying a transformation, 
especially a linear transformation involving scaling or a 
non-linear transformation, as described above results in a 
descriptor which is less sensitive, e.g. to changes of shape
outline within an object class, which consequently results 
in improved retrieval of objects.

In the embodiments described above, the transformation
is applied to the CSS values before storing in the 
descriptor database 10. Alternatively, the CSS values can 
be stored in the database 10, and the transformation carried 
out as part of the searching process, before the matching 
procedure is performed. 

In the described embodiments, the transformations are 
applied to the y-co-ordinate values. However, they may be 
applied to the x-co-ordinate values.

Industrial Applicability
A system according to the invention may, for example,
be provided in an image library. Alternatively, the
databases may be sited remote from the control unit of the
system, connected to the control unit by a temporary link
such as a telephone line or by a network such as the
internet. The image and descriptor databases may be
provided, for example, in permanent storage or on portable
data storage media such as CD-ROMs or DVDs.

Components of the system as described may be provided
in software or hardware form. Although the invention has
been described in the form of a computer system, it could be
implemented in other forms, for example using a dedicated
chip.

Specific examples have been given of methods of 
representing a 2D shape of an object, here, using CSS 
representation, and of methods for calculating values 
representing similarities between two shapes but any 
suitable such methods can be used. 

The invention can also be used, for example, for 
matching images of objects for verification purposes, or for 
filtering.